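Assessing the similarity of non-rigid shapes is a fundamental task in numerous computer vision applications. Here, we propose a novel axiomatic method for matching similar regions across shapes. Matching similar regions is formulated as the alignment of operators closely related to the Laplace-Beltrami operator (LBO). The main novelty of the proposed approach is the consideration of differential operators defined on a manifold equipped with multiple metrics. The choice of metric relates to fundamental shape properties, and considering the same manifold under different metrics can be viewed as analyzing the underlying manifold from different perspectives. Specifically, we examine the scale-invariant metric and the corresponding scale-invariant Laplace-Beltrami operator (SI-LBO) alongside the regular metric and the regular LBO. We demonstrate that the scale-invariant metric emphasizes the locations of important semantic features in articulated shapes. Consequently, the truncated spectrum of the SI-LBO better captures locally curved regions and complements the global information encapsulated in the truncated spectrum of the regular LBO. We show that an axiomatic framework that matches these dual spectra outperforms competing axiomatic frameworks when tested on standard benchmarks. We introduce a new dataset and compare the proposed method with state-of-the-art learning-based approaches in a cross-database configuration. Specifically, we show that, when trained on one dataset and tested on another, the proposed axiomatic approach, which involves no training, outperforms the deep learning alternatives.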
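Depth estimation is a cornerstone of a vast number of applications that require a 3D assessment of the environment, such as robotics, augmented reality, and autonomous driving, to name a few. One prominent technique for depth estimation is stereo matching, which has several advantages: it is considered more accessible than other depth-sensing technologies, it can produce dense depth estimates in real time, and it has benefited greatly from recent advances in deep learning. However, current techniques for depth estimation from stereo images still suffer from a built-in drawback. To reconstruct depth, a stereo matching algorithm first estimates the disparity map between the left and right images and then applies geometric triangulation. A simple analysis reveals that the depth error is quadratically proportional to the object's distance. Therefore, a constant disparity error translates into large depth errors for objects far from the camera. To mitigate this quadratic relation, we propose a simple but effective method that uses a refinement network for depth estimation. We present analytical and empirical results suggesting that the proposed learning procedure reduces this quadratic relation. We evaluate the proposed refinement procedure on well-known benchmarks and datasets, such as Scene Flow and KITTI, and demonstrate significant improvements in the depth accuracy metric.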
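The quadratic relation follows directly from triangulation. For a rectified stereo pair with focal length f (in pixels) and baseline B, depth is Z = f·B/d, so a small disparity error δd produces a depth error of roughly Z²/(f·B)·|δd|. The sketch below illustrates this first-order estimate; the focal length, baseline, and disparity error are illustrative assumptions, not values from the paper.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m: float, disparity_err_px: float,
                focal_px: float, baseline_m: float) -> float:
    """First-order depth error: |dZ| ~= Z**2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * abs(disparity_err_px)


# Illustrative camera parameters (assumptions, not taken from the paper).
FOCAL_PX = 720.0
BASELINE_M = 0.5

# The same 0.5 px disparity error costs four times more depth accuracy
# each time the object distance doubles.
for depth_m in (5.0, 10.0, 20.0, 40.0):
    err_m = depth_error(depth_m, 0.5, FOCAL_PX, BASELINE_M)
    print(f"depth {depth_m:5.1f} m -> approx. error {err_m:.3f} m")
```

Doubling the distance quadruples the estimated error, which is why a constant disparity error hurts far objects most.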
Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least some task-relevant target-domain training data are available for adaptation from the source to the target domain. However, this requires laborious model tuning by the end user, who may prefer a system that works "out of the box." To address such practical scenarios, we identify a novel target-domain (inference-time) VAD task in which no target-domain training data are available. To this end, we propose a new "Zero-shot Cross-domain Video Anomaly Detection (zxvad)" framework that includes a future-frame prediction generative model setup. Unlike prior future-frame prediction models, our model uses a novel Normalcy Classifier module to learn the features of normal event videos by learning how such features differ "relatively" from the features of pseudo-abnormal examples. A novel Untrained Convolutional Neural Network based Anomaly Synthesis module crafts these pseudo-abnormal examples by adding foreign objects to normal video frames at no extra training cost. With our novel relative normalcy feature learning strategy, zxvad generalizes and learns to distinguish between normal and abnormal frames in a new target domain without adaptation during inference. Through evaluations on common datasets, we show that zxvad outperforms the state of the art (SOTA), regardless of whether task-relevant (i.e., VAD) source training data are available. Lastly, zxvad also beats the SOTA methods on inference-time efficiency metrics, including model size, total parameters, GPU energy consumption, and GMACs.
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
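The difference between the usual sequential layout and the parallel layout described above can be sketched with plain residual updates. In the sketch below, `toy_attention` and `toy_mlp` are hypothetical stand-ins for the real sublayers (they are not attention or an MLP), and normalization is omitted; only the wiring of the residual connections is meant to be illustrative.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def sequential_layer(x: Vector, attention: Sublayer, mlp: Sublayer) -> Vector:
    """Standard layout: the MLP reads the attention-updated state."""
    h = add(x, attention(x))
    return add(h, mlp(h))


def parallel_layer(x: Vector, attention: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel layout: both sublayers read the same input x, giving a
    single Euler-like step x + attention(x) + mlp(x)."""
    return add(x, add(attention(x), mlp(x)))


# Hypothetical stand-ins for the real sublayers (assumptions for illustration).
def toy_attention(x: Vector) -> Vector:
    mean = sum(x) / len(x)
    return [0.1 * (mean - v) for v in x]  # mixes information across positions


def toy_mlp(x: Vector) -> Vector:
    return [0.1 * max(v, 0.0) for v in x]  # position-wise nonlinearity


x0 = [1.0, -1.0]
print("sequential:", sequential_layer(x0, toy_attention, toy_mlp))
print("parallel:  ", parallel_layer(x0, toy_attention, toy_mlp))
```

Read as numerical integration, the sequential layer takes two successive steps, while the parallel layer takes one step whose right-hand side is the sum of both sublayers.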
Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw features and the clustering goal allows part semantic segmentation to be performed implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method to object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
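To make the normalized-cut objective mentioned above concrete, the sketch below evaluates the standard two-way definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), for a hard partition of a small graph. This is only the classical objective on a made-up affinity matrix; it is not the paper's GNN or its differentiable loss.

```python
from typing import Sequence


def normalized_cut(weights: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Two-way normalized cut for a hard partition.

    weights: symmetric affinity matrix; labels: 0 (side A) or 1 (side B) per node.
    """
    n = len(weights)
    cut = 0.0            # total affinity crossing from A to B
    assoc = [0.0, 0.0]   # total affinity from each side to all nodes
    for i in range(n):
        for j in range(n):
            assoc[labels[i]] += weights[i][j]
            if labels[i] == 0 and labels[j] == 1:
                cut += weights[i][j]
    if assoc[0] == 0.0 or assoc[1] == 0.0:
        raise ValueError("each side of the partition needs nonzero total affinity")
    return cut / assoc[0] + cut / assoc[1]


# Toy affinities (made up): nodes 0-1 and nodes 2-3 form two tight pairs.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]

good = normalized_cut(W, [0, 0, 1, 1])  # separates the two tight pairs
bad = normalized_cut(W, [0, 1, 0, 1])   # splits both tight pairs
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

A lower value indicates a better partition, which is what makes the objective usable as a training signal once the hard labels are relaxed to soft assignments.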
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA), which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with the highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm that can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network, which we call the decoder. Third, we propose a whole reconstruction algorithm that generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS alone, as illustrated in our experiments on synthetic images with multiple 3D objects.
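For context, the NMS step that DSA is applied after can be sketched as the standard greedy procedure below. This is a generic illustration of NMS, not the DSA algorithm itself; the boxes, scores, and threshold are made-up values.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up detections: two overlapping boxes and one isolated low-score box.
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.3]
print(nms(boxes, scores))
```

The isolated box survives despite its low score because NMS only compares overlapping detections, which is the kind of leftover detection a subsequent selection step would need to handle.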
Out-of-distribution (OOD) detection has attracted a large amount of attention from the machine learning research community in recent years due to its importance in deployed systems. Most previous studies focused on the detection of OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD, a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) in images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
This is a continuation of our recent paper in which we developed the theory of sequential parametrized motion planning. A sequential parametrized motion planning algorithm produces a motion of the system that is required to visit a prescribed sequence of states, in a certain order, at specified moments of time. In the previous publication we analysed the sequential parametrized topological complexity of the Fadell-Neuwirth fibration, which is relevant to the problem of moving multiple robots while avoiding collisions with other robots and with obstacles in Euclidean space. In addition, in the preceding paper we found the sequential parametrized topological complexity of the Fadell-Neuwirth bundle for the case of the Euclidean space $\Bbb R^d$ of odd dimension as well as the case $d=2$. In the present paper we give the complete answer for arbitrary even $d\ge 2$. Moreover, we present an explicit motion planning algorithm for controlling multiple robots in $\Bbb R^d$ that has the minimal possible topological complexity; this algorithm is applicable to any number $n$ of robots and any number $m\ge 2$ of obstacles.
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first try to understand when neural networks fail to generalize. We empirically identify a property of a model that correlates strongly with its generalization, which we call "model sensitivity". Based on our analysis, we propose a novel strategy, Spectral Adversarial Data Augmentation (SADA), to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses the state-of-the-art Single-DG methods.
State-of-the-art object detectors are fast and accurate, but they require a large amount of well-annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top of a table", is much easier. Additionally, common-sense relationships like "on-top-of" are easy to annotate in a task-agnostic fashion. In this paper, we propose a probabilistic model that uses such relational knowledge to transform an off-the-shelf detector of coarse object categories (e.g., "table", "lamp") into a detector of fine-grained categories (e.g., "table-lamp"). We demonstrate that our method, RelDetect, achieves performance competitive with fine-tuning-based state-of-the-art object detector baselines when an extremely small amount of fine-grained annotations is available ($0.2\%$ of the entire dataset). We also demonstrate that RelDetect is able to utilize the inherent transferability of relationship information to obtain better performance ($+5$ mAP points) than the above baselines on an unseen dataset (zero-shot transfer). In summary, we demonstrate the power of using relationships for object detection on datasets where fine-grained object categories can be linked to coarse-grained categories via suitable relationships.